The Boundary Forest Algorithm for Online Supervised and Unsupervised Learning

Authors

Charles Mathy

Nate Derbinsky

José Bento

Jonathan Rosenthal

Jonathan S. Yedidia

Abstract

We describe a new instance-based learning algorithm, the Boundary Forest (BF) algorithm, that can be used for supervised and unsupervised learning. The algorithm builds a forest of trees whose nodes store previously seen examples. It can be shown data points one at a time and updates itself incrementally, so it is naturally online. Few instance-based algorithms combine this property with speed, as the BF does. This is crucial for applications where one needs to respond to input data in real time. The number of children of each node is not set beforehand but is obtained from the training procedure, which makes the algorithm very flexible with regard to the data manifolds it can learn. We test its generalization performance and speed on a range of benchmark datasets and detail the settings in which it outperforms the state of the art. Empirically, we find that training time scales as O(D N log N) and testing time as O(D log N), where D is the dimensionality and N is the number of data points.
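The abstract gives no procedural detail, so the following is only an illustrative sketch of one way a single tree of stored examples could be queried and updated online. The greedy descent rule, the "store only when the prediction is wrong" rule, the Euclidean distance, and all names (Node, BoundaryTreeSketch, query, train, predict) are assumptions made for illustration, not the authors' specification. A forest would presumably combine several such trees.

```python
import math


class Node:
    """One stored training example (hypothetical structure)."""

    def __init__(self, x, y):
        self.x = x          # feature vector, a tuple of floats
        self.y = y          # label
        self.children = []  # child count is not fixed in advance


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class BoundaryTreeSketch:
    """Assumed greedy tree of examples; not the published algorithm."""

    def __init__(self, x, y):
        self.root = Node(x, y)
        self.size = 1

    def query(self, x):
        # Descend greedily: move to the closest child, and stop when
        # the current node is at least as close as any of its children.
        node = self.root
        while True:
            best = min([node] + node.children, key=lambda n: dist(n.x, x))
            if best is node:
                return node
            node = best

    def train(self, x, y):
        # Online update: store the example only if the tree's current
        # answer for x carries a different label (assumed rule).
        nearest = self.query(x)
        if nearest.y != y:
            nearest.children.append(Node(x, y))
            self.size += 1

    def predict(self, x):
        return self.query(x).y
```

In this sketch, a call such as tree.train((10.0,), "b") on a tree rooted at ((0.0,), "a") adds a node, while a later tree.train((1.0,), "a") adds nothing because the tree already answers "a" there. Each query costs one distance computation per candidate along a single root-to-leaf path, which is the kind of structure that could give the logarithmic test-time scaling the abstract reports, though that depends on the tree staying reasonably balanced.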

Similar resources

Ensemble learning with trees and rules: Supervised, semi-supervised, unsupervised

In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised, semi-supervised and unsupervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by post processing the rules with partial least squares regression have significantly better prediction performance than ...

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

Online Clustering with Experts

Approximating the k-means clustering objective with an online learning algorithm is an open problem. We introduce a family of online clustering algorithms by extending algorithms for online supervised learning, with access to expert predictors, to the unsupervised learning setting. Instead of computing prediction errors in order to re-weight the experts, the algorithms compute an approximation ...

Combining Classifier Guided by Semi-Supervision

The article suggests an algorithm for regular classifier ensemble methodology. The proposed methodology is based on possibilistic aggregation to classify samples. The argued method optimizes an objective function that combines environment recognition, multi-criteria aggregation term and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...

Semi-supervised Online Multiple Kernel Learning Algorithm for Big Data

In order to improve the performance of machine learning in big data, online multiple kernel learning algorithms are proposed in this paper. First, a supervised online multiple kernel learning algorithm for big data (SOMK_bd) is proposed to reduce the computational workload during kernel modification. In SOMK_bd, the traditional kernel learning algorithm is improved and kernel integration is onl...

Publication date: 2015
